A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings
Authors
Abstract
Let X and Y be two strings of lengths n and m, respectively, and let k and l be the numbers of runs in their run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of X and Y in O(kl + min{p1, p2}) time, where p1 and p2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. This improves the previously known time bound O(min{nl, km}) and, in some cases, outperforms the time bounds O(kl log kl) and O((k + l + q) log(k + l + q)), where q denotes the number of matched blocks.
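To make the quantities in the abstract concrete, the following Python sketch run-length encodes two strings, counts their runs (k and l), and computes the LCS length with the classic O(nm) dynamic program. It is only a baseline for comparison, not the algorithm proposed in the paper. The function names and sample strings are hypothetical, and the count q assumes that a "matched block" is a pair of runs, one from each string, that share the same character.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the runs of s as (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def lcs_length(x, y):
    """Baseline O(nm) dynamic program for the LCS length.

    This is the textbook method on uncompressed strings, shown only
    as a reference point; it is NOT the paper's run-length algorithm.
    """
    prev = [0] * (len(y) + 1)  # DP row for the previous character of x
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def count_matched_blocks(runs_x, runs_y):
    """Count run pairs with equal characters (assumed meaning of q)."""
    return sum(1 for cx, _ in runs_x for cy, _ in runs_y if cx == cy)


# Hypothetical sample inputs.
x = "aaabbbbcc"
y = "aabbbccc"
runs_x = run_length_encode(x)  # [('a', 3), ('b', 4), ('c', 2)], so k = 3
runs_y = run_length_encode(y)  # [('a', 2), ('b', 3), ('c', 3)], so l = 3
k, l = len(runs_x), len(runs_y)
q = count_matched_blocks(runs_x, runs_y)
print(k, l, q, lcs_length(x, y))
```

For these inputs the baseline fills n × m = 72 table cells, while the block grid induced by the runs has only k × l = 9 cells. That gap is what bounds expressed in k and l aim to exploit.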
Related resources
Longest common subsequence between run-length-encoded strings: a new algorithm with improved parallelism
Data compression can be used to simultaneously reduce memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M × N) to O(mN + Mn − mn), where M and N are the lengths...
Matching for Run-Length Encoded Strings
1 Motivation. Measuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y i...
Fast Algorithms for Computing the Constrained LCS of Run-Length Encoded Strings
In the constrained longest common subsequence (CLCS) problem, we are given two sequences X, Y and the constrained sequence P in run-length encoded (RLE) format, where |X| = n, |Y| = m and |P| = r, and the numbers of runs in RLE format are N, M and R, respectively. In this paper, we show that after the sequences are encoded, the CLCS problem can be solved in O(NMr + r × min{q1, q2} + q3) time,...
Development of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique (CMLCS) for Biological Sequences Prediction
A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence (MLCS) problem are based on dynamic programming. The MLCS problem is to find the longest subsequence shared between two or more strings. For over 30 years, significant efforts have been made to find ef...
Finding a longest common subsequence between a run-length-encoded string and an uncompressed string
In this paper, we propose an O(min{mN, Mn}) time algorithm for finding a longest common subsequence of strings X and Y with lengths M and N, respectively, and run-length-encoded lengths m and n, respectively. We propose a new recursive formula for finding a longest common subsequence of Y and X, where X is in the run-length-encoded format. That is, Y = y1 y2 ··· yN and X = r1 1 r2 2 ··· rm m, where ri i...
Journal title:
- Inf. Process. Lett.
Volume 108, Issue
Pages -
Publication date: 2008